PON parameterization
Introduction This is additional infomation to the PON parameterization as described by http://www.teokem.lu.se/~ulf/Methods/ponparm.html The method can be found in: "Automated Molecular Mechanics Parameterization with Simultaneous Utilization of Experimental and Quantum Chemical Data", Per-Ola Norrby and Tommy Liljefors, J. Comput. Chem. 1998, 19, 1146-1166. New implementation for Amber (Dec. 2006): P. Rydberg, L. Olsen, P.-O. Norrby and U. Ryde "A general transition-state force field for cytochrome P450 hydroxylation", J. Chem. Theory Comput. 2007, 3, 1765-1773 The example used is a parameterization carried out for the active site of Fe-hydrogenase (Hedegård, Ryde, In prep) Preliminary Setup The first many steps (1. to 11.) do not involve any parameter optimization but serve as preparation of the relevant files (obtaining a QM Hessian, and preparing amber input files with correct atom types etc.). 1. Structure and frequencies with Gaussian or Turbomole and calculate RESP charges (Gaussian) * Structure and Frequencies (Gaussian or Turbomole) With turbomole, optimize the structure as usual with jobex -backup -c 200 -ri From this structure, insert: $dipgrad file=hessian $hessian file=hessian $vibrational normal modes file=hessian $maxcor (70% of available memory is recommended)> and then use aoforce -backup -c 200 -ri Further infomation on Turbomole in Lund can be found at http://www.teokem.lu.se/~ulf/Methods/turbomole.html * RESP charges is best done with Gaussian (make an example input file). Usually one can use the Merz-Kollmann (MK) scheme (see http://www.teokem.lu.se/~ulf/Methods/resp.html). * Construct pdb (from turbomole, use turbotopdb > pdb or mimic > pdb - the latter is just a Lund alias) * Use antechamber (see Link above - the link also contain information about more tricky cases such as polarizable force fields, or if there are problems with antechamber). If antechamber works, many of the steps below can be skipped and amberfiles res.in and res.dat will be genereted with correct M, B, E. 3 symbols). 2. Amber input files with antechamber or define * Use antechamber (see Link above - the link also contain information about more tricky cases such as polarizable force fields, or if there are problems with antechamber). If antechamber works, many of the steps below can be skipped and amberfiles res.in and res.dat will be genereted with correct M, B, E. 3 symbols). * 2. In case antechamber does not work, we use the program describe to build an amber topology file, typically called residue.in e.g. feh.in in our case (do not renumber the atoms if not necessary). Describe extracts force constants of bonds, angles and dihedrals from the Hessian using the method desribed by: J. M. Seminario, Int. J. Quantum Chem. 1996, 60, 1271-1277. Describe: Note that we here only use describe to get parameters (start-guesses) and amber files (.in and .dat). Describe will also do several things in addition concerning the geometry - this is can be skipped. # Run describe and follow instructions (see also comments below) # Select input (turbomole here) # Select scaling (for B3LYP, 1 is generally fine) # The next things can be skipped for PON parametrization (geometry related) # construct atmtyp file (press enter). Note: When we wish to reparameterize an active site, we will make a unique atomtype for each individual atom (and thus optimize parameters for all atoms). Methyl group hydrogens can, however, get the same atom types. # The atmtype file will have the atom names from turbomole. Modify the atmtyp file (see template below), and save a copy of old res.in and res.dat files and rerun describe the same way as above but this time read in the atmtyp. # Check the res.in file: #* Add RESP charges if missing #* Relabel atom names if necessary and fix M, S, B, 3 and E labels. #* Note: M, S, B, 3 and E are most likely not correct for when constructed for metal centers by describe (in describe, default is M and heavy atoms and E on hydrogens). Therefore the file must be modified. M = Main chain, E = End atom, S = Side chain,3 = Usually for C in methyl groups). There is a short tutorial on these labels at http://ambermd.org/doc/prep.html Usually this will also mean that one has to relabel the atoms in the res.in file #* Add LOOPS if missing (see example res.in file below) - this should be done for rings #* Add IMPROPER if missing (see example res.in file below). Determine whether rings are flat (see phenyl alanine's .in file for an example or see below for feh.in) atmtyp file (template below, taken from part of the active site of FeH2ase) Parameters for Fe-H2ase active site, atom types unique for all atoms except CH3 and CH2, 7/1-16 # Title followed by blank line FEH # residue name HA Hc # atom_name (use the name from the pdb), atmom _type CB Cc HB2 Hc HB3 Hc SG Sc #IMPORTANT: The atomtype must NOT coincide with existing atomtypes in Amber - this will lead to distaster! res.in file example 0 0 2 feh.dat Parameters for Fe-H2ase active site, atom types unique for all atoms except CH3 feh.dat FEH INT 1 CHANGE OMIT DU BEG 0.00000 1 DUMM DU M 0.0000 0.0000 .0000 .0000 2 DUMM DU M 0.0000 1.0000 .0000 .0000 3 DUMM DU M 1.0000 1.0000 .0000 .0000 4 HA Hc M 8.1434 -2.8437 21.6027 0.086770 5 CB Cc M 7.6796 -3.8276 21.7690 -0.127063 6 HB2 Hc E 8.4609 -4.5998 21.6968 0.060572 7 HB3 Hc E 6.9388 -4.0010 20.9739 0.060572 8 SG Sc M 6.9326 -3.8194 23.4452 -0.447847 9 FE Fe M 6.1257 -5.9659 24.0301 0.501975 10 CO1 Co S 5.4501 -7.5149 24.6158 0.059706 11 OC1 Oc E 5.0305 -8.5111 25.0039 -0.188459 12 CO2 cO S 6.0879 -6.4412 22.3144 0.189272 13 OC2 oC E 6.0468 -6.7207 21.2028 -0.218542 14 OW Ow B 4.1589 -4.8577 23.6759 -0.688580 15 HW1 hW E 3.7597 -5.1557 22.8461 0.384445 16 HW2 Hw E 4.7340 -4.0969 23.4131 0.286564 17 C8 C8 M 7.9473 -6.5351 24.1731 0.278762 18 O8 o8 E 8.5986 -7.1974 23.3982 -0.472728 19 C7 C7 M 8.6508 -5.9355 25.4082 -0.067468 20 H71 h7 E 9.3054 -6.7007 25.8607 0.092807 21 H72 h7 E 9.3150 -5.1425 25.0213 0.092807 22 C6 C6 M 7.6916 -5.3643 26.4039 -0.207512 23 C5 C5 S 8.1095 -4.9145 27.6584 0.091602 24 C5M C9 3 9.5606 -4.9479 28.0657 -0.109615 25 HM1 h6 E 9.9441 -5.9815 28.1061 0.049705 26 HM2 h6 E 10.1904 -4.4030 27.3435 0.049705 27 HM3 h6 E 9.7038 -4.4957 29.0548 0.049705 28 N1 Np M 6.3955 -5.3413 26.0071 -0.027346 29 C2 C2 M 5.4611 -4.9164 26.8651 0.226877 30 O2 o2 S 4.1972 -5.0319 26.4304 -0.420783 31 H2 h8 E 3.5798 -4.7157 27.1012 0.387308 32 C3 C3 M 5.7633 -4.4030 28.1404 -0.107862 33 C3M C1 3 4.6777 -3.8845 29.0533 -0.158906 34 H31 h9 E 5.0438 -3.0787 29.7127 0.066277 35 H32 h9 E 3.8529 -3.4194 28.4873 0.066277 36 H33 h9 E 4.2393 -4.6670 29.6993 0.066277 37 C4 C4 M 7.1145 -4.4222 28.5276 0.139582 38 O3P o3 M 7.5112 -3.9809 29.7415 -0.446029 39 H3P h0 E 6.7491 -3.7001 30.2644 0.401176 LOOP C4 C5 IMPROPER C2 C6 N1 FE C3 N1 C2 O2 C4 C2 C3 C3M C5 C3 C4 O3P C6 C4 C5 C5M N1 C5 C6 C7 DONE STOP 3. Construct prmtop and prmcrd files * Copy .dat file to frcmod.hemts (cp feh.dat frcmod.hemts) * Check non-bonded parameters in the frcmod.hemts (.dat) file: ALL hydrogens should have vdW parameters - parameters for sp2 and sp3 carbons should be different (see Amber force field for "inspiration") * run tleap -f leap_com (input: A pdb file for the residue with RESP chages, the res.in file and the res.dat (here called "frcmod.hemts"). A sample leap_com input is given below * Check parameters with changeparm *# use changeparm (read prmtop and the pdb file in, i.e. here feh.pdb?) *# Run command e (energies should be close to zero - changeparm will suggest new values, right? *# Use new values and run changeparm again leap_com input sample Sample taken from FeH2ase source leaprc.ff14SB # Use Amber 14 force field addAtomTypes { # Add undefined atom types to prevent errors in tleap {"Hc" "H" "sp3"} {"Hw" "H" "sp3"} {"hW" "H" "sp3"} {"h6" "H" "sp3"} {"h7" "H" "sp3"} {"h8" "H" "sp3"} {"h9" "H" "sp3"} {"h0" "H" "sp3"} {"C1" "C" "sp3"} {"C7" "C" "sp3"} {"C9" "C" "sp3"} {"Cc" "C" "sp3"} {"C2" "C" "sp2"} {"C3" "C" "sp2"} {"C4" "C" "sp2"} {"C5" "C" "sp2"} {"C6" "C" "sp2"} {"C8" "C" "sp2"} {"Co" "C" "sp2"} {"cO" "C" "sp2"} {"Np" "N" "sp2"} {"o2" "O" "sp3"} {"o3" "O" "sp3"} {"o8" "O" "sp2"} {"Ow" "O" "sp2"} {"oC" "O" "sp2"} {"Oc" "O" "sp2"} {"Sc" "O" "sp3"} {"Fe" "Fe" "sp3"} } loadAmberPrep feh.in loadAmberParams frcmod.hemts x=loadpdb feh.pdb saveamberparm x prmtop prmcrd quit 4. Construct the file hessian (this is the MM hessian) * Save a copy of the QM hessian from turbomole (cp hessian hessian-qm) * Construct the input files nmodemm.in (http://www.teokem.lu.se/~ulf/Methods/ponparm.html#Sample_nmodemm.in_file) and nmode.in (http://www.teokem.lu.se/~ulf/Methods/ponparm.html#Sample_nmode.in_file). Note that for writing out the hessian a modified version of AMBER is needed (the modifications needed are shown here http://www.teokem.lu.se/~ulf/Methods/ponparm.html#Modifications_of_Amber_nmode). Test that the inputs are constructed correctly with: * Run nmode -O -i nmodemm.in -o nmodemm.out -p prmtop -c prmcrd -r mdrest * Run nmodeh -O -i nmode.in -o nmode.out -p prmtop -c mdrest -r temp An MM hessian file (called "hessian" should now be constructed). PON Programs 1. Make PON input * run mkpar > paropt the mkpar script uses the *dat file to construct the input in the right format for the PON program (CalcLoop). paropt contains 5 columns: # the line in the *dat file # columns in .dat file () # the current value # step factor # format The "values" are e.g. force constants and equilibrium distances/angles for bonded and angular parameters. For dihedrals, only the force constants are optimized. * Check paropt - remove parameters that should not be fitted (we did not remove any - only those already zeroth in the frcmod.hemts/*dat file 2. Make D1 and scaling files (contains Hessian to be read by PON programs ) * run GetHess_t turbomole.out (gives D1.cal and scaling - if D1 exists, save a copy D1-orig) 3. Make list of dihedrals (large i.e. > 150 degrees should be removed from penalty function) * run PonInit 4. Run PON CalcLoop (can take a while, so submit it to the queue) * NOTE: The input in frcmod.hemts is heavily formatted. All commas must be aligned! * Check the paropt file for "***" This is an format error that sometimes occur! * run CalcLoop (output Loop.log) * Check convergence of penalty function with poncheck.sh (should be around 0.25*#data_points) *Check results and possibly re-run CalcLoop 5. Analysis and further parameter optimization after first run It is normally not done just after one iteration. * run ponresult > pon.results. Look for (follows roughly the order of pon.results) *# The penalty function value i.e. Chi^2 in Norrby & Liljefors (1998) *# Number of negative second derivaties *# Negative JTJ elements; J = Jacobian matrix, I guess -see Eqs. 6 and 7 in Norrby & Liljefors (1998) *# Cases where the first derivative > second derivative: As quoted from Norrby & Liljefors 1998): "When close to an optimum value parameters set, the second derivatives should be positive and large compared to the first derivates" - When the opposite is the case, the corresponding parameter is probably bad... *# # of zeroth force constants for - divide them into 1) bonds (bc), 2) angles (a) and dihedrals (dc) *# # Poorly determinated parameters *# r^2 (Pearson correlation coefficient) *# MAD *# Number of elements that differ > 100 between QM and MM hessian *# RMS contributions to bonds ("distances and energy"), "angles and energy", dihedrals (w also non-zero fc) * Change parameters that needs change in frcmod.hemts (e.g. angles with force constants < 10 and equilibrium angles > 120 should be set large/smaller * Make copy of paropt (to paropt.old) and run ponmkpar. Run now CalcLoop again * NOTE: it can be necessary to re-run mkpar (MkPar > paropt) in between CalcLoop optimizations. (e.g. sometimes step-values are set to zero!). Therefore: Always carefully check paropt before starting the calculation! * CalcLoopHC is buggy: Do not run this. * When to stop: Low penalty function and no negative second derivatives and no zeroed bond or angle force constants is typically a good sign! 6. Merging PON with amber The finished force field needs to be merged with the amber force fields. This is achieved by runnining tleap -s -f leap.in A sample leap file is given below with examples. * Atomtypes (as given in the xxx files) have to be defined in the amber style * Make and new "junction.dat" file: Around the junctions to the active site, amber will not be able to locate even standard paramters (bonds, angles and dihedrals), since parameters the standard amino acid atomtypes (as defined in amber) and the new PON atomtypes are not defined. The paramters can be adapted from the standard ones, but applying the PON atomtypes instead * Inspiration for *.in and *dat files can be found in /common/sw/alarik/pkg/bio/AMBER/Amber14/dat/leap/parm/ /common/sw/alarik/pkg/bio/AMBER/Amber14/dat/leap/prep/